Explaining Data Patterns using Knowledge from the Web of Data
Knowledge Discovery (KD) is a long-established field aiming to develop methodologies for detecting hidden patterns and regularities in large datasets, using techniques from a wide range of domains such as statistics, machine learning, pattern recognition and data visualisation. In most real-world contexts, the interpretation and explanation of the discovered patterns is left to human experts, who use their background knowledge to analyse, refine and make the patterns understandable for the intended purpose. Explaining patterns is therefore an intensive and time-consuming process, in which parts of the knowledge can remain hidden, especially when the experts lack some of the required background knowledge.
In this thesis, we investigate the hypothesis that this interpretation process can be facilitated by introducing background knowledge from the Web of (Linked) Data. In the last decade, many fields have started publishing and sharing their domain-specific knowledge in the form of structured data, with the objective of encouraging information sharing, reuse and discovery. With a constantly increasing amount of shared and connected knowledge, we assume that the process of explaining patterns can become easier, faster and more automated.
To demonstrate this, we developed Dedalo, a framework that automatically provides explanations for data patterns using background knowledge extracted from the Web of Data. We studied the elements required for a piece of information to be considered an explanation, identified the best strategies for automatically finding the right piece of information in the Web of Data, and designed a process able to produce explanations for a given pattern using background knowledge autonomously collected from the Web of Data.
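The abstract does not spell out how a candidate explanation is scored. As a purely illustrative sketch, one can treat an explanation as a Linked Data path leading to a value shared by the items of a pattern, and rank candidates by how much of the pattern they cover; the toy graph and all names below are invented, not Dedalo's API.

```python
# Hypothetical sketch: score a candidate explanation (a property path plus a
# target value) by the fraction of pattern items it covers. Invented data.

def follow_path(item, path, graph):
    """Return the set of values reached from `item` along `path` in `graph`."""
    frontier = {item}
    for prop in path:
        frontier = {o for s in frontier for o in graph.get(s, {}).get(prop, set())}
    return frontier

def coverage(pattern_items, path, value, graph):
    """Fraction of pattern items that reach `value` via `path`."""
    hits = sum(1 for it in pattern_items if value in follow_path(it, path, graph))
    return hits / len(pattern_items)

# Toy graph: item -> {property -> {objects}}
graph = {
    "book1": {"author": {"authorA"}},
    "book2": {"author": {"authorB"}},
    "authorA": {"nationality": {"British"}},
    "authorB": {"nationality": {"British"}},
}
cluster = ["book1", "book2"]
print(coverage(cluster, ["author", "nationality"], "British", graph))  # 1.0
```

A high-coverage path such as author -> nationality -> British is then a candidate explanation of why these books were clustered together.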
The final evaluation of Dedalo involved users in an empirical study based on a real-world scenario. We demonstrated that the explanation process is difficult for those unfamiliar with the domain at hand, but also that it can be considerably simplified by using the Web of Data as a source of background knowledge.
Explaining data patterns using background knowledge from Linked Data
When using data mining to find regularities in data, the obtained results (or patterns) need to be interpreted. The explanation of such patterns is achieved using background knowledge, which might be scattered among different sources. This intensive process is usually entrusted to domain experts. With the rise of Linked Data and the increasing number of connected datasets, we assume that access to this knowledge can become easier, faster and more automated. This PhD research aims to demonstrate whether Linked Data can be used to provide the background knowledge for pattern interpretation, and if so, how.
Explaining clusters with inductive logic programming and linked data
Knowledge Discovery consists of detecting hidden regularities in large amounts of data using data mining techniques. The obtained patterns require an interpretation, which is usually achieved using background knowledge provided by experts from several domains. Meanwhile, the rise of Linked Data has increased the amount of connected, cross-disciplinary knowledge available, in the form of RDF datasets, classes and relationships. Here we show how Linked Data can be used in an Inductive Logic Programming process, providing the background knowledge for finding hypotheses about the hidden connections between the items of a cluster. Using an example with clusters of books, we show how different Linked Data sources can be used to automatically generate rules giving an underlying explanation for such clusters.
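To make the idea concrete, here is a minimal, invented sketch in the spirit of ILP (not the authors' system): it searches for a single-literal rule over background facts that covers the books in a cluster and excludes the rest. The predicates and facts are illustrative only.

```python
# ILP-flavoured toy search: find a rule cluster(X) :- pred(X, value) that is
# consistent with positive and negative examples. All facts are invented.

background = {
    ("genre", "book1", "fantasy"), ("genre", "book2", "fantasy"),
    ("genre", "book3", "crime"),
    ("language", "book1", "English"), ("language", "book2", "English"),
    ("language", "book3", "English"),
}
positives = {"book1", "book2"}   # the cluster to explain
negatives = {"book3"}            # books outside the cluster

def covers(pred, value, book):
    return (pred, book, value) in background

# Enumerate candidate (predicate, value) pairs and keep those covering all
# positives and no negatives.
candidates = {(p, v) for (p, _, v) in background}
rules = [(p, v) for (p, v) in candidates
         if all(covers(p, v, b) for b in positives)
         and not any(covers(p, v, b) for b in negatives)]
print(rules)  # [('genre', 'fantasy')] -> cluster(X) :- genre(X, fantasy)
```

In the paper's setting, the background facts would come from RDF datasets rather than a hand-written set, and the hypothesis space would include longer conjunctions.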
Update of time-invalid information in Knowledge Bases through Mobile Agents
In this paper, we investigate the use of a mobile, autonomous agent to update knowledge bases containing statements that lose validity over time. This constitutes a key issue in terms of knowledge acquisition and representation, because dynamic data need to be constantly re-evaluated to allow reasoning. We focus on how to represent the time-validity of statements in a knowledge base, and on the use of a mobile agent to update time-invalid statements while planning for “information freshness” as the main objective. We propose to use Semantic Web standards, namely the RDF model and the SPARQL query language, to represent the time-validity of information and to decide how long it will be considered valid. Using such a representation, a plan is created for the agent to update the knowledge, focusing mostly on guaranteeing the time-validity of the collected information. To show the feasibility of our approach and discuss its limitations, we test its implementation on scenarios in the working environment of our research lab, where an autonomous robot is used to sense temperature, humidity, wifi signal and number of people on demand, updating the knowledge base with time-valid information.
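As a hedged sketch of one possible modelling (the paper's exact vocabulary may differ), RDF reification with an invented ex:expires property lets a single SPARQL query pick out the statements whose validity has lapsed, i.e. the ones the agent should re-sense.

```python
# Sketch: annotate a reified statement with an expiry time and query for
# stale statements. The ex:expires property is invented for illustration.
from datetime import datetime, timedelta, timezone
from rdflib import Graph, Literal, Namespace, RDF, XSD

EX = Namespace("http://example.org/")
g = Graph()

# Reify the observation "room1 has temperature 21.5" and attach an expiry.
stmt = EX.obs1
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.room1))
g.add((stmt, RDF.predicate, EX.temperature))
g.add((stmt, RDF.object, Literal(21.5)))
expired = datetime.now(timezone.utc) - timedelta(minutes=5)  # already stale
g.add((stmt, EX.expires, Literal(expired, datatype=XSD.dateTime)))

# Statements whose validity has lapsed are the ones the agent must refresh.
stale = g.query("""
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ex:  <http://example.org/>
    SELECT ?s ?p WHERE {
        ?st rdf:subject ?s ; rdf:predicate ?p ; ex:expires ?t .
        FILTER (?t < NOW())
    }""")
for row in stale:
    print(row.s, row.p)  # http://example.org/room1 http://example.org/temperature
```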
DKA-robo: dynamically updating time-invalid knowledge bases using robots
In this paper we present the DKA-robo framework, in which a mobile agent is used to update those statements of a knowledge base that have lost their validity over time. Managing the dynamic information of knowledge bases constitutes a key issue in many real-world scenarios, because constantly re-evaluating data requires effort in terms of knowledge acquisition and representation. Our solution to this problem is to use RDF and SPARQL to represent and manage the time-validity of information, combined with an agent acting as a mobile sensor that updates the outdated statements in the knowledge base, thereby always guaranteeing time-valid results against user queries. This demo shows the implementation of our approach in the working environment of our research lab, where a robot is used to sense temperature, humidity, wifi signal and number of people on demand, updating the lab knowledge base with time-valid information.
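The demo plans for information freshness; as a toy illustration only (not DKA-robo's actual planner), an earliest-deadline-first ordering of the pending sensing tasks could look like this. All readings and timings are invented.

```python
# Illustrative greedy plan: visit the locations whose readings expire soonest
# first, so the knowledge base stays time-valid for as long as possible.
readings = [
    {"where": "kitchen", "what": "temperature", "expires_in": 40},  # seconds
    {"where": "lab",     "what": "humidity",    "expires_in": 10},
    {"where": "office",  "what": "people",      "expires_in": 25},
]

plan = sorted(readings, key=lambda r: r["expires_in"])
for stop in plan:
    print(f"re-sense {stop['what']} at {stop['where']}")
# re-sense humidity at lab
# re-sense people at office
# re-sense temperature at kitchen
```

A real planner would also weigh travel time between locations against the deadlines, which this sketch ignores.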
Using Linked Data traversal to label academic communities
In this paper we exploit knowledge from Linked Data to ease the process of analysing scholarly data. In recent years, many techniques have been presented with the aim of analysing such data and revealing new, previously hidden knowledge, generally presented in the form of “patterns”. However, the discovered patterns often still require human interpretation to be further exploited, which can be a time- and energy-consuming process. Our idea is that the knowledge shared within Linked Data can actually help and ease the process of interpreting these patterns. In practice, we show how research communities obtained through standard network analytics techniques can be made more understandable by exploiting the knowledge contained in Linked Data. To this end, we apply our system Dedalo which, by performing a simple Linked Data traversal, is able to automatically label clusters of words corresponding to the topics of the different communities.
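As an illustration of the labelling step only, a cluster of topic words can be labelled with the value most of its entities reach through a short Linked Data path. The "category" data below is invented, and Dedalo's traversal and scoring are more sophisticated than this majority vote.

```python
# Toy labelling sketch: pick the category shared by most cluster members.
from collections import Counter

# entity -> categories reached by a short traversal (invented data)
categories = {
    "ontology": {"Semantic_Web"},
    "RDF":      {"Semantic_Web", "Data_formats"},
    "SPARQL":   {"Semantic_Web", "Query_languages"},
}
community = ["ontology", "RDF", "SPARQL"]

counts = Counter(c for e in community for c in categories.get(e, ()))
label, support = counts.most_common(1)[0]
print(label, support)  # Semantic_Web 3
```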
Towards the Temporal Streaming of Graph Data on Distributed Ledgers
We present our work-in-progress on handling temporal RDF graph data using the Ethereum distributed ledger. The motivation for this work is scenarios where multiple distributed consumers of streamed data may need or wish to verify that data has not been tampered with since it was generated – for example, if the data describes something which can be or has been sold, such as domestically-generated electricity. We describe a system in which temporal annotations, and information suitable for validating a given dataset, are stored on a distributed ledger, alongside the results of fixed SPARQL queries executed at the time of data storage. The model adopted implements a graph-based form of temporal RDF, in which time intervals are represented by named graphs corresponding to ledger entries. We conclude by discussing evaluation, what remains to be implemented, and future directions.
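Under the stated model, a minimal sketch of the storage side might look as follows. The vocabulary is invented, and the Ethereum transaction itself is stood in for by a plain list, since the abstract does not detail the contract interface.

```python
# Sketch under assumptions: one named graph per time interval; the ledger
# entry holds a digest of that graph so consumers can detect tampering.
import hashlib
from rdflib import Dataset, Literal, Namespace, URIRef, XSD

EX = Namespace("http://example.org/")
ds = Dataset()

# The named graph for one interval holds the readings produced in it.
interval = URIRef("http://example.org/interval/2024-01-01T10")
g = ds.graph(interval)
g.add((EX.meter1, EX.generatedKWh, Literal("3.2", datatype=XSD.decimal)))

# Digest a sorted N-Triples serialisation (a rough stand-in for proper graph
# canonicalisation) so a consumer can later re-serialise and compare.
nt = "\n".join(sorted(g.serialize(format="nt").splitlines()))  # str in rdflib >= 6
digest = hashlib.sha256(nt.encode("utf-8")).hexdigest()

ledger = []  # placeholder for an Ethereum transaction
ledger.append({"graph": str(interval), "sha256": digest})
print(ledger)
```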
Learning to Assess Linked Data Relationships Using Genetic Programming
The goal of this work is to learn a measure supporting the detection of strong relationships between Linked Data entities. Such relationships can be represented as paths of entities and properties, and can be obtained through a blind graph search process traversing Linked Data. The challenge is therefore the design of a cost function able to detect the strongest relationship between two given entities by objectively assessing the value of a given path. To achieve this, we use a Genetic Programming approach within a supervised learning method to generate path evaluation functions that compare well with human evaluations. We show how such a cost function can be generated using only basic topological features of the nodes of the paths as they are being traversed (i.e. without knowledge of the whole graph), and how it can be improved by introducing a very small amount of knowledge about the vocabularies of the properties that connect nodes in the graph.
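To give a flavour of the setup (this is not the authors' GP system), candidate cost functions can be built from basic topological features and scored by how well they reproduce human orderings of paths. All features, paths and scores below are made up, and the sketch uses selection only, where real GP would also apply crossover and mutation.

```python
# Toy version of the learning setup: random candidate scoring functions over
# path features, fitness = agreement with human pairwise orderings.
import random

# Each path is summarised by topological features gathered during traversal.
paths = [
    {"length": 2, "avg_out_degree": 120, "human_score": 0.9},
    {"length": 5, "avg_out_degree": 15,  "human_score": 0.4},
    {"length": 3, "avg_out_degree": 60,  "human_score": 0.7},
]

FEATURES = ["length", "avg_out_degree"]
OPS = [lambda a, b: a + b, lambda a, b: a * b, lambda a, b: a - b]

def random_function():
    """A depth-1 expression tree: op(w1 * f1, w2 * f2)."""
    op = random.choice(OPS)
    f1, f2 = random.choice(FEATURES), random.choice(FEATURES)
    w1, w2 = random.uniform(-1, 1), random.uniform(-1, 1)
    return lambda p: op(w1 * p[f1], w2 * p[f2])

def fitness(fn):
    """Count path pairs ordered the same way as the human scores."""
    ok = 0
    for i, a in enumerate(paths):
        for b in paths[i + 1:]:
            if (fn(a) > fn(b)) == (a["human_score"] > b["human_score"]):
                ok += 1
    return ok

random.seed(0)
population = [random_function() for _ in range(50)]
best = max(population, key=fitness)
print(fitness(best))  # pairs correctly ordered by the best candidate
```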
Walking Linked Data: a graph traversal approach to explain clusters
Link traversal is one of the biggest advantages of Linked Data, as it allows the serendipitous discovery of new knowledge thanks to the natural connections between data from different sources. Our general problem is to understand how this property can benefit the Knowledge Discovery process: in particular, we aim at using Linked Data to explain the patterns extracted from a typical data mining process such as clustering. The strategy we propose here is Linked Data traversal, in which we explore and build on the fly an unknown Linked Data graph by simply dereferencing entities’ URIs until, by following the links between entities, we find a valid explanation for our clusters. The experiments section gives an insight into the performance of such an approach, in terms of time and scalability, and shows how the links easily gather knowledge from different data sources.
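A minimal sketch of such a traversal, assuming rdflib and HTTP content negotiation: the system described above adds heuristics to decide which links to expand, whereas this version only bounds the number of hops.

```python
# Breadth-first Linked Data traversal: dereference an entity's URI, parse the
# returned RDF, and follow the URIs it links to, up to max_hops away.
from collections import deque
from rdflib import Graph, URIRef

def traverse(start_uri, max_hops=2):
    seen, frontier = set(), deque([(URIRef(start_uri), 0)])
    g = Graph()
    while frontier:
        node, depth = frontier.popleft()
        if node in seen or depth > max_hops:
            continue
        seen.add(node)
        try:
            g.parse(node)  # HTTP dereference; relies on content negotiation
        except Exception:
            continue       # not every URI resolves to parseable RDF
        for _, _, obj in g.triples((node, None, None)):
            if isinstance(obj, URIRef):
                frontier.append((obj, depth + 1))
    return g

# e.g. traverse("http://dbpedia.org/resource/Linked_data")
```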